The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while the reception of prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Translated by Google Translate
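The survey abstract above reports patch-based training as the most common way to handle samples that are too large to process at once. As a minimal sketch of the idea, and not the procedure of any surveyed team, the following computes a regular grid of patch positions over a volume. The function name `patch_slices`, its parameters, and the choice to add one border-aligned patch per axis are assumptions made for illustration.

```python
from itertools import product


def patch_slices(shape, patch_size, stride):
    """Return tuples of slices that tile a volume of `shape` with patches.

    Hypothetical helper for illustration: patches are placed on a regular
    grid, and one extra patch is aligned with the far border of each axis
    when the grid would otherwise leave voxels uncovered.
    """
    per_axis = []
    for dim, size, step in zip(shape, patch_size, stride):
        if size > dim:
            raise ValueError("patch is larger than the volume along one axis")
        starts = list(range(0, dim - size + 1, step))
        if starts[-1] != dim - size:
            starts.append(dim - size)  # border-aligned patch
        per_axis.append([slice(s, s + size) for s in starts])
    # Cartesian product of the per-axis slices gives every patch position.
    return list(product(*per_axis))


# Example: a 4 x 5 x 6 volume tiled with 2 x 2 x 2 patches, stride 2.
patches = patch_slices((4, 5, 6), (2, 2, 2), (2, 2, 2))
```

Each returned tuple can index a NumPy-style array directly (`volume[patch]`), so a training loop could sample from `patches` instead of loading the full volume into memory.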
Federated Learning (FL) is pervasive in privacy-focused IoT environments because it avoids privacy leakage by training models with gradients instead of data. Recent works show that the uploaded gradients can be used to reconstruct data (gradient leakage attacks), and several defenses have been designed to alleviate this risk by tweaking the gradients. However, these defenses exhibit weak resilience against strong attacks, because their effectiveness rests on the unrealistic assumption that deep neural networks can be simplified to linear models. In this paper, without such unrealistic assumptions, we present a novel defense called Refiner. Instead of perturbing gradients, Refiner refines ground-truth data to craft robust data that retains sufficient utility while carrying the least amount of privacy information; the gradients of the robust data are then uploaded. To craft robust data, Refiner pushes the gradients of critical parameters associated with the robust data to be close to the ground-truth ones, while leaving the gradients of trivial parameters free to safeguard privacy. Moreover, to exploit the gradients of trivial parameters, Refiner uses a well-designed evaluation network to steer the robust data far away from the ground-truth data, thereby alleviating the risk of privacy leakage. Extensive experiments across multiple benchmark datasets demonstrate the superior effectiveness of Refiner in defending against state-of-the-art threats.
It is widely believed that the higher the uncertainty of a word in a caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually generate all words in a sentence sequentially and treat them equally. In this paper, we propose an uncertainty-aware image captioning framework, which iteratively inserts discontinuous candidate words between existing words in parallel, from easy to difficult, until convergence. We hypothesize that high-uncertainty words in a sentence need more prior information to be decided correctly and should be produced at a later stage. The resulting non-autoregressive hierarchy makes caption generation explainable and intuitive. Specifically, we use an image-conditioned bag-of-words model to measure word uncertainty and apply a dynamic programming algorithm to construct the training pairs. During inference, we devise an uncertainty-adaptive parallel beam search technique that yields an empirically logarithmic time complexity. Extensive experiments on the MS COCO benchmark show that our approach outperforms the strong baseline and related methods in both captioning quality and decoding speed.
Recently, vector quantized autoregressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by predicting discrete image tokens in the latent space from top left to bottom right, treating all tokens equally. Although this simple generative process works surprisingly well, is it the best way to generate an image? For instance, human creation tends to proceed from the outline of an image to its fine details, while VQ-AR models do not consider the relative importance of each component. In this paper, we present a progressive denoising model for high-fidelity text-to-image generation. The proposed method creates new image tokens from coarse to fine, in parallel and based on the existing context, and this procedure is applied recursively until the image sequence is complete. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments demonstrate that the progressive model produces significantly better results than the previous VQ-AR method in FID score across a wide variety of categories and aspects. Moreover, the text-to-image generation time of traditional AR models increases linearly with the output image resolution and is therefore quite long even for normal-size images. In contrast, our approach achieves a better trade-off between generation quality and speed.
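The transferability of adversarial examples (AEs) across different models is critical for black-box adversarial attacks, in which attackers cannot access information about the black-box model. However, crafted AEs always exhibit poor transferability. In this paper, by regarding the transferability of AEs as the generalization ability of a model, we reveal that vanilla black-box attacks craft AEs by solving a maximum likelihood estimation (MLE) problem. For MLE, the result is likely to be a model-specific local optimum when the available data is small, which limits the transferability of AEs. In contrast, we reformulate the crafting of transferable AEs as a maximum a posteriori probability estimation problem, which is an effective way to improve the generalization of results when the available data is limited. Because Bayesian posterior inference is generally intractable, we develop a simple yet effective method called MaskBlock to approximate the estimate. Moreover, we show that the formulated framework is a generalized version of various attack methods. Extensive experiments show that MaskBlock can significantly improve the transferability of crafted adversarial examples, by up to 20%.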
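The opacity of neural networks makes them vulnerable to backdoor attacks, in which the hidden attention of infected neurons is triggered to override normal predictions with attacker-chosen ones. In this paper, we propose a novel backdoor defense method to mark and purify the infected neurons in backdoored neural networks. Specifically, we first define a new metric called benign salience. By incorporating the first-order gradient to retain the connections between neurons, benign salience can identify infected neurons with higher accuracy than the metrics commonly used in backdoor defense. Then, a new Adaptive Regularization (AR) mechanism is proposed to help purify the identified infected neurons through fine-tuning. Because it can adapt to different parameter magnitudes, AR provides faster and more stable convergence than the common regularization mechanisms used in neuron purification. Extensive experimental results show that our method can erase backdoors in neural networks with negligible performance degradation.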
The interaction and dimension of points are two important axes in designing point operators to serve hierarchical 3D models. Yet, these two axes are heterogeneous and challenging to explore fully. Existing works craft point operators along a single axis and reuse the crafted operator in all parts of a 3D model. This overlooks the opportunity to better combine point interactions and dimensions by exploiting the varying geometry and density of 3D point clouds. In this work, we establish PIDS, a novel paradigm that jointly explores point interactions and point dimensions for semantic segmentation on point cloud data. We establish a large search space that jointly considers versatile point interactions and point dimensions, which supports point operators with various geometry and density considerations. The enlarged search space, with its heterogeneous search components, calls for a better ranking of candidate models. To achieve this, we improve the search space exploration by leveraging predictor-based Neural Architecture Search (NAS), and enhance the quality of prediction by assigning unique encodings to heterogeneous search components based on their priors. We thoroughly evaluate the networks crafted by PIDS on two semantic segmentation benchmarks, showing ~1% mIOU improvement on SemanticKITTI and S3DIS over state-of-the-art 3D models.
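The PIDS abstract above reports its gains in mIOU (mean intersection over union). As a minimal sketch of how this metric is commonly computed from per-point labels, the function below builds a confusion matrix and averages per-class IoU. The name `mean_iou` and the choice to skip classes that appear in neither the labels nor the predictions are assumptions for illustration; SemanticKITTI and S3DIS each define their own evaluation protocols (for example, which labels are ignored), which this sketch does not reproduce.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes, from flat label lists.

    Illustrative helper: classes absent from both y_true and y_pred are
    skipped rather than counted as IoU 0 or 1 (an assumption, conventions vary).
    """
    # confusion[t][p] counts points with true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(row[c] for row in confusion) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no class is present in labels or predictions")
    return sum(ious) / len(ious)


# Example with three classes and six points.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

In this example the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5.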
Outcome prediction is crucial for head and neck cancer patients, as it can provide prognostic information for early treatment planning. Radiomics methods have been widely used for outcome prediction from medical images. However, these methods are limited by their reliance on intractable manual segmentation of tumor regions. Recently, deep learning methods have been proposed to perform end-to-end outcome prediction and thereby remove the reliance on manual segmentation. Unfortunately, without segmentation masks, these methods take the whole image as input, which makes it difficult for them to focus on tumor regions and may leave them unable to fully leverage the prognostic information within those regions. In this study, we propose a radiomics-enhanced deep multi-task framework for outcome prediction from PET/CT images, in the context of the HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022). The novelty of our framework is to incorporate radiomics as an enhancement to our recently proposed Deep Multi-task Survival model (DeepMTS). DeepMTS jointly learns to predict the survival risk scores of patients and the segmentation masks of tumor regions. Radiomics features are extracted from the predicted tumor regions and combined with the predicted survival risk scores for the final outcome prediction, so that the prognostic information in tumor regions can be further leveraged. Our method achieved a C-index of 0.681 on the testing set, placing 2nd on the leaderboard with a C-index only 0.00068 lower than that of the 1st place.
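The result above is reported as a C-index (concordance index), which measures how often a model ranks patients' risks in the same order as their observed survival times. The following is a minimal sketch of Harrell's C-index for right-censored data, not the evaluation code of the HECKTOR challenge; the function name `concordance_index` and the convention of counting tied risk scores as half-concordant are assumptions for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed time for each subject (event or censoring time)
    events : 1/True if the event was observed, 0/False if censored
    risks  : predicted risk scores (higher means earlier expected event)

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. Ties in risk count as 0.5
    (an assumed convention for this sketch).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Example: four subjects, the third one censored at time 6.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
```

Here five pairs are comparable and four are ranked correctly, so `c` is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to a perfect one, which gives a sense of scale for the 0.00068 gap mentioned above.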
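To build recommender systems that not only consider user-item interactions represented as ordinal variables but also exploit the social network describing the relationships between users, we develop a hierarchical Bayesian model called ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions. OGFA not only achieves good recommendation performance but also extracts interpretable latent factors that correspond to representative user preferences. We further extend OGFA to the ordinal graph gamma belief network, a deep probabilistic model with multiple stochastic layers that captures user preferences and social communities at multiple semantic levels. For efficient inference, we develop a parallel hybrid Gibbs-EM algorithm that exploits the sparsity of the graphs and scales to large datasets. Our experimental results show that the proposed models not only outperform recent baselines on recommendation datasets with explicit or implicit feedback but also provide interpretable latent representations.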
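Adversarial training (AT) has been shown to be an effective way of introducing strong adversarial robustness into deep neural networks. However, the high computational cost of AT prohibits deploying large-scale AT on resource-constrained edge devices, for example those with limited computing power and a small memory footprint, in Federated Learning (FL) applications. Few previous studies have tried to address these constraints at the same time. In this paper, we propose a new framework named Federated Adversarial Decoupled Learning (FADE) to enable AT on resource-constrained edge devices in FL. FADE reduces computation and memory usage by applying Decoupled Greedy Learning (DGL) to federated adversarial training, so that each client only needs to perform AT on a small module of the entire model in each communication round. In addition, we improve vanilla DGL by adding an auxiliary weight decay that alleviates objective inconsistency and achieves better performance. FADE offers theoretical guarantees for adversarial robustness and convergence. Experimental results also show that FADE can significantly reduce the computing resources consumed while maintaining almost the same accuracy and robustness as fully joint training.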